Exploration Diagnostic

Setup

First, we need to do some set up to analyze our data

Include dependencies

library(ggplot2)       # For plotting
library(tidyverse)     # For data wrangling
library(knitr)         # For making nice rmarkdown documents
library(cowplot)       # For theme
library(viridis)       # For color scale
library(RColorBrewer)  # For more color scales
library(rstatix)
library(ggsignif)      # For adding pairwise significance to plots
library(Hmisc)         # For bootstrapping condifence internvals
library(kableExtra)    # For displaying nice tables
source("https://gist.githubusercontent.com/benmarwick/2a1bb0133ff568cbe28d/raw/fb53bd97121f7f9ce947837ef1a4c65a73bffb3f/geom_flat_violin.R") # For raincloud plots
library(readr)        # For reading in data
library(stringr)      # For manipulating string data
library(ggpubr)       # For displaying correlation statistics on plots
library(infotheo)     # For causality analysis
library(scales)       # For displaying scales nicely in facetted plots
library(osfr)         # For downloading the data for this project

These analyses were conducted in the following computing environment:

print(version)
##                _                           
## platform       x86_64-pc-linux-gnu         
## arch           x86_64                      
## os             linux-gnu                   
## system         x86_64, linux-gnu           
## status                                     
## major          4                           
## minor          0.4                         
## year           2021                        
## month          02                          
## day            15                          
## svn rev        80002                       
## language       R                           
## version.string R version 4.0.4 (2021-02-15)
## nickname       Lost Library Book

Setup constants to be used across plots

# Labeler for stats annotations
p_label <- function(p_value) {
  threshold = 0.0001
  if (p_value < threshold) {
    return(paste0("p < ", threshold))
  } else {
    return(paste0("p = ", p_value))
  }
}

# Significance threshold
alpha <- 0.05

# Common graph variables
performance_ylim <- 1
coverage_ylim <- 1.0

####### misc #######
# Configure our default graphing theme
theme_set(theme_cowplot())

The data for this project are hosted on osf.io. The following chunk downloads them automatically if they haven’t already been downloaded.

# Read in data
osf_retrieve_file("esm4r") %>% osf_download(conflicts = "skip")  # Download data from osf
data_loc <- "final_exploration_diagnostic_data.csv"

data <- read_csv(data_loc, na=c("NONE", "NA", ""))

## Clean up data columns

# Make selection name column human readable
data <- data %>% mutate(selection_name = as.factor(case_when(
  selection_name == "EpsilonLexicase" ~ "Lexicase",
  TOUR_SIZE == 1 ~ "Random",
  selection_name == "Tournament" ~ "Tournament",
  selection_name == "FitnessSharing" ~ "Fitness Sharing",
  selection_name == "EcoEa" ~ "EcoEA"
)))

# Calculate performance statistics.
# Elite trait avg is the avg per-site performance of the best individual
data$elite_trait_avg <-
  data$ele_agg_per / data$OBJECTIVE_CNT
data$unique_start_positions_coverage <-
  data$uni_str_pos / data$OBJECTIVE_CNT

# Convert elite_trait_avg to percent of maximum possible
data$elite_trait_avg <- data$elite_trait_avg/data$TARGET

# Grab data from just the final time point
final_data <- filter(data, evaluations==max(data$evaluations))

Performance

For context, it’s important to know how each selection scheme performed on the exploration diagnostic.

Over time

First we look at the dynamics of performance over time.

Trait performance

Here, we plot average trait performance (i.e. fitness) over time for each selection scheme. We log the x-axis because Eco-EA gains fitness over a very long time scale, whereas the interesting dynamics for the other selection schemes occur relatively quickly.

ggplot(
    data,
    aes(
      x=gen,                 # Generations
      y=elite_trait_avg,     # Performance
      color=selection_name,  # Selection scheme
      fill=selection_name
    )
  ) +
  stat_summary(geom="line", fun=mean) + # Plot line showing mean for each selection scheme
  stat_summary(  # Add shading around each line indicating 95% confiedence interval
    geom="ribbon",
    fun.data="mean_cl_boot",
    fun.args=list(conf.int=0.95),
    alpha=0.2,
    linetype=0
  ) +
  scale_y_continuous(
    name="Average trait performance", # Set y axis title
    limits=c(0, performance_ylim)  # Set y axis range to include all possible performance values
  ) +
  scale_x_log10(  # Log x axis
    name="Generation" # Set x axis title
  ) +
  scale_color_discrete("Selection") + # Set legend title
  scale_fill_discrete("Selection")    # Set legend title

As observed by Hernandez et al. in their original paper on the exploration diagnostic [@hernandez_ExplorationExplorationMeasuring_2021], fitness in tournament selection initially increases quickly and then plateaus. Fitness in lexicase selection increases slightly slower but plateaus at a much higher value (nearly 100%). Fitness sharing behaves similarly to tournament selection, but maintains a slight upward trajectory (note that, because the x axis is on a log scale, this trajectory is very gradual). Eco-EA takes substantially longer to increase in fitness but ultimately surpasses fitness sharing and tournament selection. It is unclear whether it would pass lexicase selection if these runs were allowed to continue for slightly longer; they do not appear to have plateaued yet. We chose to cut them off at 500,000 generations due to resource constraints and the fact that the questions we’re asking here are not really about final fitness.

Activation position coverage

Out of curiosity, we also ran the analysis of unique activation positions present in the population, used by Hernandez et. al. This analysis tells us about the diversity of start positions for the coding region represented in the population. As the set of start positions in the population tends to represent a meaningful constraint on the number of paths through the fitness landscape that are currently accessible, this is in some sense a metric of useful diversity in the population

ggplot(data, aes(x=gen, y=unique_start_positions_coverage, color=selection_name, fill=selection_name)) +
  stat_summary(geom="line", fun=mean) +
  stat_summary(
    geom="ribbon",
    fun.data="mean_cl_boot",
    fun.args=list(conf.int=0.95),
    alpha=0.2,
    linetype=0
  ) +
  scale_y_continuous(
    name="Activation position coverage"
  ) +
  scale_x_log10(
    name="Generation"
  ) +
  scale_color_discrete("Selection")+
  scale_fill_discrete("Selection")

We see that lexicase selection maintains by far that largest number of unique start positions, even surpassing the number maintained by random drift. This suggests that lexicase selection is actively selecting for maintaining a diversity of start positions. Tournament selection and fitness sharing perform virtually identically, with Eco-EA falling in between.

Final

While trends over time are more informative, it can be hard to visualize the full distribution (particularly the extent of variation). Thus, we also conduct a more detailed analysis of performance at the final time point.

Trait performance

First we conduct statistics to identify which selection schemes are significantly different from each other.

# Compute manual labels for geom_signif
stat.test <- final_data %>%
  wilcox_test(elite_trait_avg ~ selection_name) %>%
  adjust_pvalue(method = "bonferroni") %>%  # Apply Bonferroni correction for multiple comparisons
  add_significance() %>%
  add_xy_position(x="selection_name",step.increase=1)
stat.test$label <- mapply(p_label,stat.test$p.adj)

Then we make raincloud plots [@allen_RaincloudPlotsMultiplatform_2021a] of each selection scheme.

elite_final_performance_fig <- ggplot(
    final_data,
    aes(
      x=selection_name,
      y=elite_trait_avg,
      fill=selection_name
    )
  ) +
  geom_flat_violin(
    position = position_nudge(x = .2, y = 0),
    alpha = .8,
    scale="width"
  ) +
  geom_point(
    mapping=aes(color=selection_name),
    position = position_jitter(width = .15),
    size = .5,
    alpha = 0.8
  ) +
  geom_boxplot(
    width = .1,
    outlier.shape = NA,
    alpha = 0.5
  ) +
  scale_y_continuous(
    name="Average trait performance",
    limits=c(0, performance_ylim)
  ) +
  scale_x_discrete(
    name="Selection"
  ) +
  scale_fill_discrete(
    name="Selection"
  ) +
  scale_color_discrete(
    name="Selection"
  ) + 
  theme(legend.position="none")
elite_final_performance_fig

These observations look fairly consistent with the timeseries plots.

Next, we output the results of our significance testing.

stat.test %>%
  kbl() %>%
  kable_styling(
    bootstrap_options = c(
      "striped",
      "hover",
      "condensed",
      "responsive"
    )
  ) %>%
  scroll_box(width="600px")
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif y.position groups xmin xmax label
elite_trait_avg EcoEA Fitness Sharing 50 50 1561 3.20e-02 3.20e-01 ns 1.922000 EcoEA , Fitness Sharing 1 2 p = 0.32
elite_trait_avg EcoEA Lexicase 50 47 60 0.00e+00 0.00e+00 **** 2.946444 EcoEA , Lexicase 1 3 p < 1e-04
elite_trait_avg EcoEA Random 50 50 2500 0.00e+00 0.00e+00 **** 3.970889 EcoEA , Random 1 4 p < 1e-04
elite_trait_avg EcoEA Tournament 50 50 1939 2.10e-06 2.07e-05 **** 4.995333 EcoEA , Tournament 1 5 p < 1e-04
elite_trait_avg Fitness Sharing Lexicase 50 47 593 2.69e-05 2.69e-04 *** 6.019778 Fitness Sharing, Lexicase 2 3 p = 0.000269
elite_trait_avg Fitness Sharing Random 50 50 2500 0.00e+00 0.00e+00 **** 7.044222 Fitness Sharing, Random 2 4 p < 1e-04
elite_trait_avg Fitness Sharing Tournament 50 50 1549 4.00e-02 4.00e-01 ns 8.068667 Fitness Sharing, Tournament 2 5 p = 0.4
elite_trait_avg Lexicase Random 47 50 2350 0.00e+00 0.00e+00 **** 9.093111 Lexicase, Random 3 4 p < 1e-04
elite_trait_avg Lexicase Tournament 47 50 2098 0.00e+00 0.00e+00 **** 10.117556 Lexicase , Tournament 3 5 p < 1e-04
elite_trait_avg Random Tournament 50 50 10 0.00e+00 0.00e+00 **** 11.142000 Random , Tournament 4 5 p < 1e-04

Fitness sharing did not perform significantly differently from Eco-EA or Tournament selection, but all other selection schemes are significantly different.

Final activation position Coverage

Now we do the same analysis for final activation position coverage.

First we calculate the statistics

# Compute manual labels for geom_signif
stat.test <- final_data %>%
  wilcox_test(unique_start_positions_coverage ~ selection_name) %>%
  adjust_pvalue(method = "bonferroni") %>%
  add_significance() %>%
  add_xy_position(x="selection_name",step.increase=1)
stat.test$manual_position <- stat.test$y.position * 1.05
stat.test$label <- mapply(p_label,stat.test$p.adj)

Then we make raincloud plots

unique_start_positions_coverage_final_fig <- ggplot(
    final_data,
    aes(
      x=selection_name,
      y=unique_start_positions_coverage,
      fill=selection_name
    )
  ) +
  geom_flat_violin(
    position = position_nudge(x = .2, y = 0),
    alpha = .8,
    scale="width"
  ) +
  geom_point(
    mapping=aes(color=selection_name),
    position = position_jitter(width = .15),
    size = .5,
    alpha = 0.8
  ) +
  geom_boxplot(
    width = .1,
    outlier.shape = NA,
    alpha = 0.5
  ) +
  scale_y_continuous(
    name="Activation position coverage",
    limits=c(0, coverage_ylim)
  ) +
  scale_x_discrete(
    name="Selection"
  ) +
  scale_fill_discrete(
    name="Selection"
  ) +
  scale_color_discrete(
    name="Selection"
  ) +
  theme(
    legend.position="none"
  )
unique_start_positions_coverage_final_fig

These also look unsurprising.

Lastly, we output the results of significance testing.

stat.test %>%
  kbl() %>%
  kable_styling(
    bootstrap_options = c(
      "striped",
      "hover",
      "condensed",
      "responsive"
    )
  ) %>%
  scroll_box(width="600px")
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif y.position groups xmin xmax manual_position label
unique_start_positions_coverage EcoEA Fitness Sharing 50 50 2392.5 0.000 0 **** 1.420000 EcoEA , Fitness Sharing 1 2 1.491000 p < 1e-04
unique_start_positions_coverage EcoEA Lexicase 50 47 0.0 0.000 0 **** 2.175556 EcoEA , Lexicase 1 3 2.284333 p < 1e-04
unique_start_positions_coverage EcoEA Random 50 50 339.0 0.000 0 **** 2.931111 EcoEA , Random 1 4 3.077667 p < 1e-04
unique_start_positions_coverage EcoEA Tournament 50 50 2387.0 0.000 0 **** 3.686667 EcoEA , Tournament 1 5 3.871000 p < 1e-04
unique_start_positions_coverage Fitness Sharing Lexicase 50 47 0.0 0.000 0 **** 4.442222 Fitness Sharing, Lexicase 2 3 4.664333 p < 1e-04
unique_start_positions_coverage Fitness Sharing Random 50 50 25.0 0.000 0 **** 5.197778 Fitness Sharing, Random 2 4 5.457667 p < 1e-04
unique_start_positions_coverage Fitness Sharing Tournament 50 50 1274.5 0.708 1 ns 5.953333 Fitness Sharing, Tournament 2 5 6.251000 p = 1
unique_start_positions_coverage Lexicase Random 47 50 2350.0 0.000 0 **** 6.708889 Lexicase, Random 3 4 7.044333 p < 1e-04
unique_start_positions_coverage Lexicase Tournament 47 50 2350.0 0.000 0 **** 7.464444 Lexicase , Tournament 3 5 7.837667 p < 1e-04
unique_start_positions_coverage Random Tournament 50 50 2475.5 0.000 0 **** 8.220000 Random , Tournament 4 5 8.631000 p < 1e-04

Phylogenetic diversity

Next, we analyze the behavior of phylogenetic diversity on the exploration diagnostic.

Relationship between different types of pylogenetic diversity

First, to get a big-picture overview, we make correlation matrices of all the different phylogenetic diversity metrics:

final_data %>% 
  transmute(MinPD=min_phenotype_pairwise_distance, 
            MeanPD=mean_phenotype_pairwise_distance, 
            MaxPD=max_phenotype_pairwise_distance, 
            VarPD=variance_phenotype_pairwise_distance, 
            MinED = min_phenotype_evolutionary_distinctiveness,
            MeanED= mean_phenotype_evolutionary_distinctiveness,
            MaxED=max_phenotype_evolutionary_distinctiveness,
            VarED=variance_phenotype_evolutionary_distinctiveness,
            PD=phenotype_current_phylogenetic_diversity,  # See Faith 1992
            MRCA=phen_mrca_depth,  # Phylogenetic depth of most recent common ancestor
            N=phen_num_taxa     # Number of taxonomically-distinct phenotypes
  ) %>%
  cor_mat() %>% 
  pull_lower_triangle() %>% 
  cor_plot()

However, these correlations may well vary by selection scheme, and even over time within a selection scheme. Let’s take a look at some scatter plots.

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=mean_phenotype_pairwise_distance,
        x=variance_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Mean and variance of pairwise distances appears to correlate fairly closely with each other across selection schemes. The exception to this is tournament selection, where the range of values for both of these metrics are very small and the correlation is directly inverse. This is likely being driven by there being a small number of taxa that are mostly siblings of each other.

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=mean_phenotype_pairwise_distance,
        x=max_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Maximum and mean pairwise distance also correlate pretty closely, with the exception again being tournament. These data shed further light on the previous graph as well - the maximum pariwise distance for tournament selection is always 2.

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=variance_phenotype_pairwise_distance,
        x=max_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Max and variance of pairwise distances behave similarly.

And let’s check that it doesn’t look radically different early on in the run:

ggplot(
    data %>% filter(gen==5000),
    aes(
        y=mean_phenotype_pairwise_distance,
        x=variance_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

The correlation between these two metrics is substantially weaker for Eco-Ea and lexicase selection at the early time point.

ggplot(
    data %>% filter(gen==5000),
    aes(
        y=mean_phenotype_pairwise_distance,
        x=max_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

This relationship is also weaker, although not as much weaker.

ggplot(
    data %>% filter(gen==5000),
    aes(
        y=variance_phenotype_pairwise_distance,
        x=max_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Similarly, the correlation between max and variance of pariwise distances is weakend, but still robust. So the relationship between mean and variance of pairwise distances is the only one that is really weaker at the earlier time points. Understanding exactly what drives these discrepancies is a promising angle for future work, as it may shed further light on the dynamics occurring in the run. It is particularly interesting that EcoEA and Lexicase selection are the scenarios where they diverge.

Similarly, the evolutionary distinctiveness metrics look largely similar to each other, but let’s spot check that too.

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=mean_phenotype_evolutionary_distinctiveness,
        x=variance_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

These aren’t the strongest correlations; different information is definitely being captured by each metric. However, they still correlate to a fair extent.

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=mean_phenotype_evolutionary_distinctiveness,
        x=max_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Again, for the most part there is a fairly high degree of correlation here.

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=mean_phenotype_evolutionary_distinctiveness,
        x=min_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

And here too, although it’s getting kind of weak for lexicase.

And let’s check out an earlier time point

ggplot(
    data %>% filter(gen==5000),
    aes(
        y=mean_phenotype_evolutionary_distinctiveness,
        x=variance_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

The relationship between min and mean evolutionary distinctiveness holds up at the earlier time point for everything except fitness sharing.

ggplot(
    data %>% filter(gen==5000),
    aes(
        y=mean_phenotype_evolutionary_distinctiveness,
        x=max_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

And here, for everything except lexicase

ggplot(
    data %>% filter(gen==5000),
    aes(
        y=mean_phenotype_evolutionary_distinctiveness,
        x=min_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
  scale_x_continuous(
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

These are relatively weak for the most part.

Looks like the evolutionary distinctiveness statistics capture more different information from each other than the pairwise distance statistics (in general the correlations are weaker, and there are more deviations from the pattern of positive correlations). However, in a lot of cases they still correlate. There is definitely more exploration to be done on what differences in these metrics imply, but for simplicity we’ll stick to one pairwise distance metric and one evolutionary distinctiveness metric.

Since it’s hard to get a gestalt impression from the scatterplots, let’s take a look at the correlation matrices for each selection scheme. Note that these are at the final time point, which clearly misses some dynamics that occurr earlier.

Random

final_data %>% 
  filter(selection_name=="Random") %>%
  transmute(MinPD=min_phenotype_pairwise_distance, 
            MeanPD=mean_phenotype_pairwise_distance, 
            MaxPD=max_phenotype_pairwise_distance, 
            VarPD=variance_phenotype_pairwise_distance, 
            MinED = min_phenotype_evolutionary_distinctiveness,
            MeanED= mean_phenotype_evolutionary_distinctiveness,
            MaxED=max_phenotype_evolutionary_distinctiveness,
            VarED=variance_phenotype_evolutionary_distinctiveness,
            PD=phenotype_current_phylogenetic_diversity,  # See Faith 1992
            MRCA=phen_mrca_depth,  # Phylogenetic depth of most recent common ancestor
            N=phen_num_taxa     # Number of taxonomically-distinct phenotypes
  ) %>%
  cor_mat() %>% 
  pull_lower_triangle() %>% 
  cor_plot()

In random selection, all the pairwise distance metrics correlate with each other, and also more weakly with some of the evolutionary distinctiveness metrics, but not max and variance (although they correlate strongly with each other). It is somewhat surprising that the depth of the most recent common ancestor and Faith’s Phylogenetic Diveristy metric have so little correlation with the others.

Tournament

final_data %>% 
  filter(selection_name=="Tournament") %>%
  transmute(MinPD=min_phenotype_pairwise_distance, 
            MeanPD=mean_phenotype_pairwise_distance, 
            MaxPD=max_phenotype_pairwise_distance, 
            VarPD=variance_phenotype_pairwise_distance, 
            MinED = min_phenotype_evolutionary_distinctiveness,
            MeanED= mean_phenotype_evolutionary_distinctiveness,
            MaxED=max_phenotype_evolutionary_distinctiveness,
            VarED=variance_phenotype_evolutionary_distinctiveness,
            PD=phenotype_current_phylogenetic_diversity,  # See Faith 1992
            MRCA=phen_mrca_depth,  # Phylogenetic depth of most recent common ancestor
            N=phen_num_taxa     # Number of taxonomically-distinct phenotypes
  ) %>%
  cor_mat() %>% 
  pull_lower_triangle() %>% 
  cor_plot()

Here, all the evolutionary distinctiveness metrics coreelate strongly with each other, but the pairiwse distance metrics less so (although this is likely an artifact of the extremely low phylogenetic diversity in tournament selection)

Fitness Sharing

final_data %>% 
  filter(selection_name=="Fitness Sharing") %>%
  transmute(MinPD=min_phenotype_pairwise_distance, 
            MeanPD=mean_phenotype_pairwise_distance, 
            MaxPD=max_phenotype_pairwise_distance, 
            VarPD=variance_phenotype_pairwise_distance, 
            MinED = min_phenotype_evolutionary_distinctiveness,
            MeanED= mean_phenotype_evolutionary_distinctiveness,
            MaxED=max_phenotype_evolutionary_distinctiveness,
            VarED=variance_phenotype_evolutionary_distinctiveness,
            PD=phenotype_current_phylogenetic_diversity,  # See Faith 1992
            MRCA=phen_mrca_depth,  # Phylogenetic depth of most recent common ancestor
            N=phen_num_taxa     # Number of taxonomically-distinct phenotypes
  ) %>%
  cor_mat() %>% 
  pull_lower_triangle() %>% 
  cor_plot()

Here, the pairwise distance metrics correlate with each other and the evolutionary distinctivenes metrics correlated with each other. There are some correlations between pairwise distance and evolutionary distinctiveness metrics, but they’re weaker.

Lexicase selection

final_data %>% 
  filter(selection_name=="Lexicase") %>%
  transmute(MinPD=min_phenotype_pairwise_distance, 
            MeanPD=mean_phenotype_pairwise_distance, 
            MaxPD=max_phenotype_pairwise_distance, 
            VarPD=variance_phenotype_pairwise_distance, 
            MinED = min_phenotype_evolutionary_distinctiveness,
            MeanED= mean_phenotype_evolutionary_distinctiveness,
            MaxED=max_phenotype_evolutionary_distinctiveness,
            VarED=variance_phenotype_evolutionary_distinctiveness,
            PD=phenotype_current_phylogenetic_diversity,  # See Faith 1992
            MRCA=phen_mrca_depth,  # Phylogenetic depth of most recent common ancestor
            N=phen_num_taxa     # Number of taxonomically-distinct phenotypes
  ) %>%
  cor_mat() %>% 
  pull_lower_triangle() %>% 
  cor_plot()

Again, the pairwise distance metrics correlate with each other but not wth the volutionary distinctivness metrics. The evolutionary distinctiveness metrics sort of correlate with each other.

Eco-EA

final_data %>% 
  filter(selection_name=="EcoEA") %>%
  transmute(MinPD=min_phenotype_pairwise_distance, 
            MeanPD=mean_phenotype_pairwise_distance, 
            MaxPD=max_phenotype_pairwise_distance, 
            VarPD=variance_phenotype_pairwise_distance, 
            MinED = min_phenotype_evolutionary_distinctiveness,
            MeanED= mean_phenotype_evolutionary_distinctiveness,
            MaxED=max_phenotype_evolutionary_distinctiveness,
            VarED=variance_phenotype_evolutionary_distinctiveness,
            PD=phenotype_current_phylogenetic_diversity,  # See Faith 1992
            MRCA=phen_mrca_depth,  # Phylogenetic depth of most recent common ancestor
            N=phen_num_taxa     # Number of taxonomically-distinct phenotypes
  ) %>%
  cor_mat() %>% 
  pull_lower_triangle() %>% 
  cor_plot()

In Eco-EA, there is an impressively strong positive correlation among all the phylodiversity metrics. Understanding what causes Eco-EA to deviate from lexicase selection (and to a lesser extent fitness sharing) in this way would be worthy of further research.

Over time

Mean pairwise distance

First we plot mean pairwise distance over time. We log the y axis because there is such variation in mean pairwise distance across selection schemes, and the x-axis for the same reason as before.

ggplot(
    data,
    aes(
      x=gen,
      y=mean_phenotype_pairwise_distance,
      color=selection_name,
      fill=selection_name
    )
  ) +
  stat_summary(geom="line", fun=mean) +
  stat_summary(
    geom="ribbon",
    fun.data="mean_cl_boot",
    fun.args=list(conf.int=0.95),
    alpha=0.2,
    linetype=0
  ) +
  scale_y_log10(
    name="Mean pairwise distance"
  ) +
  scale_x_log10(
    name="Generation"
  ) +
  scale_color_discrete("Selection") +
  scale_fill_discrete("Selection")

Lexicase selection maintains a monotonic increase in phylogenetic diversity over the course of the entire experiment. It likely never experiences a full coalescence event (where the most-recent common ancestor changes). Eco-EA nearly keeps pace with lexicase selection until towards the end, when its phylogenetic diversity crashes. This is likely the result of selective sweeps that begin to occur as the population discovers high fitness solutions. Fitness sharing shows a slight dip at the same time that its fitness plateaus (likely also the result of a selective sweep), but phylogenetic diversity recovers afterwards, making for a relatively constant level. over time. Tournament selection, on the other hand, maintains the same (low) level of phylogenetic diversity as fitness sharing, up until the point that fitness plateaus, at which point tournament selection’s phylodiversity drops to nearly 0. Interestingly, lexicase selection and Eco-EA both maintain more phylodiversity than random selection, whereas fitness sharing and tournament selection maintain less.

Mean evolutionary distinctiveness

For comparison, we make the same plot with mean evolutionary distinctiveness.

ggplot(
    data,
    aes(
      x=gen,
      y=mean_phenotype_evolutionary_distinctiveness,
      color=selection_name,
      fill=selection_name
    )
  ) +
  stat_summary(geom="line", fun=mean) +
  stat_summary(
    geom="ribbon",
    fun.data="mean_cl_boot",
    fun.args=list(conf.int=0.95),
    alpha=0.2,
    linetype=0
  ) +
  scale_y_log10(
    name="Mean evolutionary distinctiveness"
  ) +
  scale_x_log10(
    name="Generation"
  ) +
  scale_color_discrete("Selection") +
  scale_fill_discrete("Selection")

Interestingly, Fitness Sharing and tournament selection both start increasing in evolutionary distinctiveness only after their fitnesses have plateaued. This seems likely to be due to some sort of pathological behavior of mean evolutionary distinctiveness on small trees, but more investigation would be necessary to figure out exactly what’s going on. Trends in other selection schemes are largely similar.

Final

Next, we perform a more in-depth analysis of phylodiversity distributions at the final time point.

Mean pairwise distance

First, we test which selection schemes end up with significantly different final distributions of mean pairwise distance.

# Pairwise wilcoxon teset to determine which conditions are significantly different from each other
stat.test <- final_data %>%
  wilcox_test(mean_phenotype_pairwise_distance ~ selection_name) %>%
  adjust_pvalue(method = "bonferroni") %>%
  add_significance() %>%
  add_xy_position(x="selection_name",step.increase=1)
stat.test$label <- mapply(p_label,stat.test$p.adj)

# Output stats
stat.test %>%
  kbl() %>%
  kable_styling(
    bootstrap_options = c(
      "striped",
      "hover",
      "condensed",
      "responsive"
    )
  ) %>%
  scroll_box(width="600px")
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif y.position groups xmin xmax label
mean_phenotype_pairwise_distance EcoEA Fitness Sharing 50 50 1824.0 7.70e-05 7.70e-04 *** 49488.81 EcoEA , Fitness Sharing 1 2 p = 0.00077
mean_phenotype_pairwise_distance EcoEA Lexicase 50 47 227.0 0.00e+00 0.00e+00 **** 76981.49 EcoEA , Lexicase 1 3 p < 1e-04
mean_phenotype_pairwise_distance EcoEA Random 50 50 690.0 1.15e-04 1.15e-03 ** 104474.18 EcoEA , Random 1 4 p = 0.00115
mean_phenotype_pairwise_distance EcoEA Tournament 50 50 2500.0 0.00e+00 0.00e+00 **** 131966.86 EcoEA , Tournament 1 5 p < 1e-04
mean_phenotype_pairwise_distance Fitness Sharing Lexicase 50 47 0.0 0.00e+00 0.00e+00 **** 159459.54 Fitness Sharing, Lexicase 2 3 p < 1e-04
mean_phenotype_pairwise_distance Fitness Sharing Random 50 50 536.0 9.00e-07 8.70e-06 **** 186952.22 Fitness Sharing, Random 2 4 p < 1e-04
mean_phenotype_pairwise_distance Fitness Sharing Tournament 50 50 2232.5 0.00e+00 0.00e+00 **** 214444.90 Fitness Sharing, Tournament 2 5 p < 1e-04
mean_phenotype_pairwise_distance Lexicase Random 47 50 2350.0 0.00e+00 0.00e+00 **** 241937.58 Lexicase, Random 3 4 p < 1e-04
mean_phenotype_pairwise_distance Lexicase Tournament 47 50 2350.0 0.00e+00 0.00e+00 **** 269430.26 Lexicase , Tournament 3 5 p < 1e-04
mean_phenotype_pairwise_distance Random Tournament 50 50 2500.0 0.00e+00 0.00e+00 **** 296922.94 Random , Tournament 4 5 p < 1e-04

Looks like they are all significantly different from each other.

# Raincloud plot of final mean pairwise distance
final_phylogeny_fig <- ggplot(
    final_data,
    aes(
      x=selection_name,
      y=mean_phenotype_pairwise_distance,
      fill=selection_name
    )
  ) +
  geom_flat_violin(
    position = position_nudge(x = .2, y = 0),
    alpha = .8,
    scale="width"
  ) +
  geom_point(
    mapping=aes(color=selection_name),
    position = position_jitter(width = .15),
    size = .5,
    alpha = 0.8
  ) +
  geom_boxplot(
    width = .1,
    outlier.shape = NA,
    alpha = 0.5
  ) +
  scale_y_log10(
    name="Mean pairwise distance"
  ) +
  scale_x_discrete(
    name="Selection"
  ) +
  scale_fill_discrete(
    name="Selection"
  ) +
  scale_color_discrete(
    name="Selection"
  ) + 
  theme(legend.position = "none")
final_phylogeny_fig

This shows something interesting! Final phylogenetic diversity in Eco-EA is heavily bimodal. In later analysis, we will see that the runs with high phylogenetic diversity are the ones with lower fitness, suggesting that they have no yet experienced a selective sweep resulting from the discovery of a high-fitness solution.

Mean evolutionary distinctiveness

Tests for significant differences:

# Pairwise wilcoxon teset to determine which conditions are significantly different from each other
stat.test <- final_data %>%
  wilcox_test(mean_phenotype_evolutionary_distinctiveness ~ selection_name) %>%
  adjust_pvalue(method = "bonferroni") %>%
  add_significance() %>%
  add_xy_position(x="selection_name",step.increase=1)
stat.test$label <- mapply(p_label,stat.test$p.adj)

# Output stats

stat.test %>%
  kbl() %>%
  kable_styling(
    bootstrap_options = c(
      "striped",
      "hover",
      "condensed",
      "responsive"
    )
  ) %>%
  scroll_box(width="600px")
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif y.position groups xmin xmax label
mean_phenotype_evolutionary_distinctiveness EcoEA Fitness Sharing 50 50 469 1.00e-07 7.00e-07 **** 289111.7 EcoEA , Fitness Sharing 1 2 p < 1e-04
mean_phenotype_evolutionary_distinctiveness EcoEA Lexicase 50 47 0 0.00e+00 0.00e+00 **** 449625.8 EcoEA , Lexicase 1 3 p < 1e-04
mean_phenotype_evolutionary_distinctiveness EcoEA Random 50 50 711 2.05e-04 2.05e-03 ** 610140.0 EcoEA , Random 1 4 p = 0.00205
mean_phenotype_evolutionary_distinctiveness EcoEA Tournament 50 50 569 2.70e-06 2.72e-05 **** 770654.1 EcoEA , Tournament 1 5 p < 1e-04
mean_phenotype_evolutionary_distinctiveness Fitness Sharing Lexicase 50 47 100 0.00e+00 0.00e+00 **** 931168.2 Fitness Sharing, Lexicase 2 3 p < 1e-04
mean_phenotype_evolutionary_distinctiveness Fitness Sharing Random 50 50 2428 0.00e+00 0.00e+00 **** 1091682.4 Fitness Sharing, Random 2 4 p < 1e-04
mean_phenotype_evolutionary_distinctiveness Fitness Sharing Tournament 50 50 1614 1.20e-02 1.20e-01 ns 1252196.5 Fitness Sharing, Tournament 2 5 p = 0.12
mean_phenotype_evolutionary_distinctiveness Lexicase Random 47 50 2350 0.00e+00 0.00e+00 **** 1412710.6 Lexicase, Random 3 4 p < 1e-04
mean_phenotype_evolutionary_distinctiveness Lexicase Tournament 47 50 2255 0.00e+00 0.00e+00 **** 1573224.7 Lexicase , Tournament 3 5 p < 1e-04
mean_phenotype_evolutionary_distinctiveness Random Tournament 50 50 173 0.00e+00 0.00e+00 **** 1733738.9 Random , Tournament 4 5 p < 1e-04

Looks like everything except fitness sharing and tournament are significantly different from each other.

# Raincloud plot of final mean evolutionary distinctiveness
ggplot(
    final_data,
    aes(
      x=selection_name,
      y=mean_phenotype_evolutionary_distinctiveness,
      fill=selection_name
    )
  ) +
  geom_flat_violin(
    position = position_nudge(x = .2, y = 0),
    alpha = .8,
    scale="width"
  ) +
  geom_point(
    mapping=aes(color=selection_name),
    position = position_jitter(width = .15),
    size = .5,
    alpha = 0.8
  ) +
  geom_boxplot(
    width = .1,
    outlier.shape = NA,
    alpha = 0.5
  ) +
  scale_y_log10(
    name="Mean evolutionary distinctiveness"
  ) +
  scale_x_discrete(
    name="Selection"
  ) +
  scale_fill_discrete(
    name="Selection"
  ) +
  scale_color_discrete(
    name="Selection"
  ) + 
  theme(legend.position = "none")

Again, this looks fairly similar to MPD, except that fitness sharing and tournament are higher.

Phenotypic diversity

Now we analyze phenotypic (i.e. population-level) diversity. Here, we’re defining phenotypes to only include the sites that are actively contributing to fitness. So the phenotype of [1,4,2,6,5,4,3,6,7] would be [0,0,0,6,5,4,3,0,0]. Note that phylogenetic trees are also built using this conception of phenotypes.

Relationship between different types of phenotypic diversity

First, we should assess the extent to which different metrics of phenotypic diversity are capturing different information.

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=phen_diversity,
        x=phen_num_taxa,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Phenotypic shannon diversity"
  ) +
  scale_x_continuous(
        name="Phenotypic richness",
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Looks like at the final time point they are pretty much always closely correlated, although this correlation is weaker for lexicase selection than for other selection schemes.

Over time

Now we examine the behavior of each phenotypic diversity metric over time.

Richness

ggplot(
    data,
    aes(
      x=gen,
      y=phen_num_taxa,
      color=selection_name,
      fill=selection_name
    )
  ) +
  stat_summary(geom="line", fun=mean) +
  stat_summary(
    geom="ribbon",
    fun.data="mean_cl_boot",
    fun.args=list(conf.int=0.95),
    alpha=0.2,
    linetype=0
  ) +
  scale_y_continuous(
    name="Phenotypic richness"
  ) +
  scale_x_log10(
    name="Generation"
  ) +
  scale_color_discrete("Selection") + 
  scale_fill_discrete("Selection")

In contrast to the phylodiversity results, phenotypic richness in all selection schemes (even tournament selection) ultimately exceeds that of random selection. Eco-EA monotonically increases while lexicase selection reaches a maximum around the same time it reaches its fitness plateau. The only real similarity to the phylodiversity results is the behavior tournament selection and fitness sharing relative to each other.

Shannon diversity

We also looked at phenotypic shannon diversity:

ggplot(
    data,
    aes(
      x=gen,
      y=phen_diversity,
      color=selection_name,
      fill=selection_name
    )
  ) +
  stat_summary(geom="line", fun=mean) +
  stat_summary(
    geom="ribbon",
    fun.data="mean_cl_boot",
    fun.args=list(conf.int=0.95),
    alpha=0.2,
    linetype=0
  ) +
  scale_y_continuous(
    name="Phenotypic Shannon entropy"
  ) +
  scale_x_log10(
    name="Generation"
  ) +
  scale_color_discrete("Selection") + 
  scale_fill_discrete("Selection")

This is more different from the richness results than we might have expected based on the correlation at the final time point. There is a much more pronounced drop off in Shannon entropy for tournament and fitness sharing than in richness. This difference is probably driven by the fact that, after plateauing, these selection schemes are likely both at mutation-selection balance. Thus, there are very many single-mutant phenotypes with only one copy in the population. These do not contribute much to Shannon entropy or phylogenetic diversity, but it does show up in richness.

We also see that, after plataeuing, lexicase selection does actually start to decrease in Shannon entropy, but slowly. Similarly, in Eco-EA, the increase in Shannon entropy towards the end is much more modest than the increase in richness.

Final

Now we assess the final phenotypic diversity

Richness

Hypothesis-testing differences between groups:

# Determine which conditions are significantly diferrent from each other
stat.test <- final_data %>%
  wilcox_test(phen_num_taxa ~ selection_name) %>%
  adjust_pvalue(method = "bonferroni") %>%
  add_significance() %>%
  add_xy_position(x="selection_name",step.increase=1)
stat.test$label <- mapply(p_label,stat.test$p.adj)

stat.test %>%
  kbl() %>%
  kable_styling(
    bootstrap_options = c(
      "striped",
      "hover",
      "condensed",
      "responsive"
    )
  ) %>%
  scroll_box(width="600px")
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif y.position groups xmin xmax label
phen_num_taxa EcoEA Fitness Sharing 50 50 2249.0 0.00e+00 0.0e+00 **** 326 EcoEA , Fitness Sharing 1 2 p < 1e-04
phen_num_taxa EcoEA Lexicase 50 47 2319.0 0.00e+00 0.0e+00 **** 456 EcoEA , Lexicase 1 3 p < 1e-04
phen_num_taxa EcoEA Random 50 50 2500.0 0.00e+00 0.0e+00 **** 586 EcoEA , Random 1 4 p < 1e-04
phen_num_taxa EcoEA Tournament 50 50 2378.5 0.00e+00 0.0e+00 **** 716 EcoEA , Tournament 1 5 p < 1e-04
phen_num_taxa Fitness Sharing Lexicase 50 47 1428.0 6.80e-02 6.8e-01 ns 846 Fitness Sharing, Lexicase 2 3 p = 0.68
phen_num_taxa Fitness Sharing Random 50 50 1973.0 6.00e-07 6.3e-06 **** 976 Fitness Sharing, Random 2 4 p < 1e-04
phen_num_taxa Fitness Sharing Tournament 50 50 1585.0 2.10e-02 2.1e-01 ns 1106 Fitness Sharing, Tournament 2 5 p = 0.21
phen_num_taxa Lexicase Random 47 50 2339.5 0.00e+00 0.0e+00 **** 1236 Lexicase, Random 3 4 p < 1e-04
phen_num_taxa Lexicase Tournament 47 50 1359.0 1.85e-01 1.0e+00 ns 1366 Lexicase , Tournament 3 5 p = 1
phen_num_taxa Random Tournament 50 50 797.0 2.00e-03 2.0e-02
1496 Random , Tournament 4 5 p = 0.02

Raincloud plot:

# Raincloud plot of final phenotypic diversity
final_phenotypic_fig <- ggplot(
    final_data,
    aes(
      x=selection_name,
      y=phen_num_taxa,
      fill=selection_name
    )
  ) +
  geom_flat_violin(
    position = position_nudge(x = .2, y = 0),
    alpha = .8,
    scale="width"
  ) +
  geom_point(
    mapping=aes(color=selection_name),
    position = position_jitter(width = .15),
    size = .5,
    alpha = 0.8
  ) +
  geom_boxplot(
    width = .1,
    outlier.shape = NA,
    alpha = 0.5
  ) +
  scale_y_continuous(
    name="Phenotypic Richness"
  ) +
  scale_x_discrete(
    name="Selection"
  ) +
  scale_fill_discrete(
    name="Selection"
  ) +
  scale_color_discrete(
    name="Selection"
  ) +
  theme(legend.position = "none")
final_phenotypic_fig

Nothing particularly suprising here, but we should note that, based on the over time plot, this would look a lot different if we had selected a different time point.

Shannon diversity

# Determine which conditions are significantly diferrent from each other
stat.test <- final_data %>%
  wilcox_test(phen_diversity~ selection_name) %>%
  adjust_pvalue(method = "bonferroni") %>%
  add_significance() %>%
  add_xy_position(x="selection_name",step.increase=1)
stat.test$label <- mapply(p_label,stat.test$p.adj)

stat.test %>%
  kbl() %>%
  kable_styling(
    bootstrap_options = c(
      "striped",
      "hover",
      "condensed",
      "responsive"
    )
  ) %>%
  scroll_box(width="600px")
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif y.position groups xmin xmax label
phen_diversity EcoEA Fitness Sharing 50 50 2478.0 0.00e+00 0.00e+00 **** 9.602 EcoEA , Fitness Sharing 1 2 p < 1e-04
phen_diversity EcoEA Lexicase 50 47 1772.0 1.66e-05 1.66e-04 *** 13.012 EcoEA , Lexicase 1 3 p = 0.000166
phen_diversity EcoEA Random 50 50 2089.5 0.00e+00 1.00e-07 **** 16.422 EcoEA , Random 1 4 p < 1e-04
phen_diversity EcoEA Tournament 50 50 2496.0 0.00e+00 0.00e+00 **** 19.832 EcoEA , Tournament 1 5 p < 1e-04
phen_diversity Fitness Sharing Lexicase 50 47 0.0 0.00e+00 0.00e+00 **** 23.242 Fitness Sharing, Lexicase 2 3 p < 1e-04
phen_diversity Fitness Sharing Random 50 50 0.0 0.00e+00 0.00e+00 **** 26.652 Fitness Sharing, Random 2 4 p < 1e-04
phen_diversity Fitness Sharing Tournament 50 50 1856.0 2.99e-05 2.99e-04 *** 30.062 Fitness Sharing, Tournament 2 5 p = 0.000299
phen_diversity Lexicase Random 47 50 1714.0 1.01e-04 1.01e-03 ** 33.472 Lexicase, Random 3 4 p = 0.00101
phen_diversity Lexicase Tournament 47 50 2350.0 0.00e+00 0.00e+00 **** 36.882 Lexicase , Tournament 3 5 p < 1e-04
phen_diversity Random Tournament 50 50 2500.0 0.00e+00 0.00e+00 **** 40.292 Random , Tournament 4 5 p < 1e-04

Interestingly, the final shannon diversity values of different selection schemes are much more distinguishable from each other than the final richness values.

# Raincloud plot of final phenotypic diversity
ggplot(
    final_data,
    aes(
      x=selection_name,
      y=phen_num_taxa,
      fill=selection_name
    )
  ) +
  geom_flat_violin(
    position = position_nudge(x = .2, y = 0),
    alpha = .8,
    scale="width"
  ) +
  geom_point(
    mapping=aes(color=selection_name),
    position = position_jitter(width = .15),
    size = .5,
    alpha = 0.8
  ) +
  geom_boxplot(
    width = .1,
    outlier.shape = NA,
    alpha = 0.5
  ) +
  scale_y_continuous(
    name="Phenotypic Shannon Diversity"
  ) +
  scale_x_discrete(
    name="Selection"
  ) +
  scale_fill_discrete(
    name="Selection"
  ) +
  scale_color_discrete(
    name="Selection"
  ) +
  theme(legend.position = "none")

Again, we know from the time series plots that these relative relationships varied a lot over time. Eco-EA is only higher than lexicase selection at the very end.

Relationship between phenotypic and phylogenetic diversity

Now, we can finally begin to address the main questions. Do phenotypic diversity and phylogenetic diversity capture different information?

Phenotypic richness vs. mean pairwise distance

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=phen_num_taxa,
        x=mean_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Phenotypic richness"
  ) +
  scale_x_continuous(
        name="Mean pairwise distance",
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

The linear models for some of these are questionable, but the Spearman correlation coefficient should be fine because it does not require linearity. Only tournament selection is significantly different from 0. Eco-EA is even negative (although non-significant), which is probably driven by the fact that there are really two groups of runs in Eco-EA: those that have found a good solution and had their diversity crash, and those that haven’t yet.

Lets take a look at how each of those groups behave (we’ll show fitness as color, just to better understand what’s happening):

ggplot(
    data %>% filter(gen==500000, selection_name == "EcoEA"),
    aes(
        y=phen_num_taxa,
        x=mean_phenotype_pairwise_distance,
        color = elite_trait_avg
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Phenotypic richness"
  ) +
  scale_x_continuous(
        name="Mean pairwise distance",
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~mean_phenotype_pairwise_distance > 100, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  scale_color_continuous(type="viridis")

Phenotypic richness vs. mean evolutionary distinctiveness

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=phen_num_taxa,
        x=mean_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Phenotypic richness"
  ) +
  scale_x_continuous(
        name="Mean evolutionary distinctiveness",
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

These are mostly significant, but negative. That’s interesting and worthy of further exploration (but it’s a little beyond the scope of this analysis).

Phenotypic shannon diversity vs. mean pairwise distance

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=phen_diversity,
        x=mean_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Phenotypic shannon diversity"
  ) +
  scale_x_continuous(
        name="Mean pairwise distance",
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Again, these are mostly not significant (although fitness sharing notably now is).

Phenotypic shannon diversity vs. mean evolutionary distinctiveness

ggplot(
    data %>% filter(gen==500000),
    aes(
        y=phen_diversity,
        x=mean_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Phenotypic shannon diversity"
  ) +
  scale_x_continuous(
        name="Mean evolutionary distinctiveness",
        breaks = breaks_extended(4)
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Again, all significant but negative. From this we can conclude that there is substantially more overlap between the information captured by phenotypic diversity and mean evolutionary disinctiveness than between phenotypic diversity metrics and mean pairwise distance. However, thus information has wildly different implications in the two contexts. Moreover, the long term trends are still very different. Thus, we still feel confident in saying that these metrics are meaningfully different. More investigation here may be worthwhile in the future.

Overall, based on these end of time analyses and qualitative comparison of the temporal graphs, we conclude that phenotypic and phylogenetic diversity are meaningfully different from each other in the context of evolutionary computation. In retrospect, this finding may seem obvious. However a lot of people tend to assume that these two classes of diversity are closely related.

Relationship between diversity and success

At last, we can finally assess whether diversity leads to actually solving problems! We start out by looking at correlations between diversity success across a few different time points, although we ultimately conclude that this approach is not particularly informative.

Very early in run

Mean pairwise distance

ggplot(
    data %>% filter(gen==25000),
    aes(
        y=elite_trait_avg,
        x=mean_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Mean pairwise distance"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Mean evolutionary distinctiveness

ggplot(
    data %>% filter(gen==25000),
    aes(
        y=elite_trait_avg,
        x=mean_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Mean evolutionary distinctiveness"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Richness

ggplot(
    data %>% filter(gen==25000),
    aes(
        y=elite_trait_avg,
        x=phen_num_taxa,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Phenotypic richness"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Shannon diversity

ggplot(
    data %>% filter(gen==25000),
    aes(
        y=elite_trait_avg,
        x=phen_diversity,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Phenotypic Shannon diversity"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Middle of run

Mean pairwise distance

ggplot(
    data %>% filter(gen==50000),
    aes(
        y=elite_trait_avg,
        x=mean_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Mean pairwise distance"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Mean evolutionary distinctiveness

ggplot(
    data %>% filter(gen==50000),
    aes(
        y=elite_trait_avg,
        x=mean_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Mean evolutionary distinctiveness"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Richness

ggplot(
    data %>% filter(gen==50000),
    aes(
        y=elite_trait_avg,
        x=phen_num_taxa,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Phenotypic richness"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Shannon diversity

ggplot(
    data %>% filter(gen==50000),
    aes(
        y=elite_trait_avg,
        x=phen_diversity,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Phenotypic Shannon diversity"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

End of run

Mean pairwise distance

ggplot(
    final_data,
    aes(
        y=elite_trait_avg,
        x=mean_phenotype_pairwise_distance,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Mean pairwise distance"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Mean evolutionary distinctiveness

ggplot(
    final_data,
    aes(
        y=elite_trait_avg,
        x=mean_phenotype_evolutionary_distinctiveness,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Mean evolutionary distinctiveness"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Richness

ggplot(
    final_data,
    aes(
        y=elite_trait_avg,
        x=phen_num_taxa,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Phenotypic richness"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Shannon diversity

ggplot(
    final_data,
    aes(
        y=elite_trait_avg,
        x=phen_diversity,
        color=selection_name,
        fill=selection_name
    )
  ) +
  geom_point() +
    scale_y_continuous(
        name="Average trait performance"
  ) +
  scale_x_continuous(
        name="Phenotypic Shannon diversity"
  ) + 
  facet_wrap(
      ~selection_name, scales="free"
  ) + 
  stat_smooth(
    method="lm"
  ) + 
  stat_cor(
    method="spearman", cor.coef.name = "rho", color="black"
  ) +
  theme(legend.position = "none")

Causality analysis

Ultimately, it’s hard to draw useful inferences either from these single time point analyses or from comparison of line plots. Due to the feedbacks between diversity and performance, there may be a time delay in when one affects the other.

To analyze the drivers of this feedback loop in a more rigorous way, we turn to Transfer Entropy as a metric of Granger Causality. For a more thorough description, see the paper associated with this supplement.

Setup

First let’s define a function we’ll use to calculate and output significance and effect size for these results:

transfer_entropy_stats <- function(res) {
  stat.test <- res %>%
    group_by(selection_name, offset) %>%
    wilcox_test(value ~ Type) %>%
    adjust_pvalue(method = "bonferroni") %>%
    add_significance() 
  stat.test$label <- mapply(p_label,stat.test$p.adj)
  
  # Calculate effect sizes for these differences
  effect_sizes <- res %>%
    group_by(selection_name, offset) %>%
    wilcox_effsize(value ~ Type)
  
  stat.test$effsize <- effect_sizes$effsize
  stat.test$magnitude <- effect_sizes$magnitude
  
  stat.test %>%
    kbl() %>%
    kable_styling(
      bootstrap_options = c(
        "striped",
        "hover",
        "condensed",
        "responsive"
      )
    ) %>%
    scroll_box(width="800px")
}

Transfer entropy from diversity to fitness

First, we calculate the information that each type of diversity gives us about future fitness.

Max pairwise distance vs. phenotypic richness

We plot the differences in transfer entropy from phylogenetic diversity to fitness and from phenotypic diversity to fitness, at a range of different lags.

# Calculate transfer entropy for max pairwise distance
# Time points are 10 generations, so calculating lag 1 gives us lag 10
res <- data %>% group_by(directory, selection_name) %>%
summarise(
  fit_phylo_10 =     condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10
                                     discretize(lag(max_phenotype_pairwise_distance, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_phylo_100 =    condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100
                                     discretize(lag(max_phenotype_pairwise_distance, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_phylo_1000 =   condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 1000
                                     discretize(lag(max_phenotype_pairwise_distance, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_phylo_10000 =  condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10000
                                     discretize(lag(max_phenotype_pairwise_distance, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_phylo_100000 = condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100000
                                     discretize(lag(max_phenotype_pairwise_distance, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000))),  
  fit_fit_10000 =    condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 1000)), 
                                     discretize(lag(max_phenotype_pairwise_distance, 1000))),
  fit_fit_100000 =   condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 10000)), 
                                     discretize(lag(max_phenotype_pairwise_distance, 10000))),  
  fit_pheno_10 =     condinformation(discretize(elite_trait_avg), # TE pheno -> fit, lag 10
                                     discretize(lag(phen_num_taxa, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_pheno_100 =    condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100 
                                     discretize(lag(phen_num_taxa, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_pheno_1000 =   condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 1000
                                     discretize(lag(phen_num_taxa, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_pheno_10000 =  condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 10000
                                     discretize(lag(phen_num_taxa, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_pheno_100000 = condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100000
                                     discretize(lag(phen_num_taxa, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("o_10"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo") ~ "Phylogenetic", TRUE ~ "Phenotypic")

# Plot transfer entropy
ggplot(
  res %>% filter(str_detect(name, "fit_ph*")), 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

Next we calculate statistics to quantify these differences

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic Phylogenetic 50 50 463 1.00e-07 0.0000015 **** p < 1e-04 0.5425436 large
EcoEA 100 value Phenotypic Phylogenetic 50 50 242 0.00e+00 0.0000000 **** p < 1e-04 0.6948970 large
EcoEA 1000 value Phenotypic Phylogenetic 50 50 195 0.00e+00 0.0000000 **** p < 1e-04 0.7272980 large
EcoEA 10000 value Phenotypic Phylogenetic 50 50 163 0.00e+00 0.0000000 **** p < 1e-04 0.7493582 large
EcoEA 100000 value Phenotypic Phylogenetic 50 50 247 0.00e+00 0.0000000 **** p < 1e-04 0.6914501 large
Fitness Sharing 10 value Phenotypic Phylogenetic 50 50 699 1.48e-04 0.0037000 ** p = 0.0037 0.3798495 moderate
Fitness Sharing 100 value Phenotypic Phylogenetic 50 50 739 4.33e-04 0.0108250
p = 0.010825 0.3522742 moderate
Fitness Sharing 1000 value Phenotypic Phylogenetic 50 50 751 5.89e-04 0.0147250
p = 0.014725 0.3440016 moderate
Fitness Sharing 10000 value Phenotypic Phylogenetic 50 50 598 7.10e-06 0.0001770 *** p = 0.000177 0.4494771 moderate
Fitness Sharing 100000 value Phenotypic Phylogenetic 50 50 415 0.00e+00 0.0000002 **** p < 1e-04 0.5756340 large
Lexicase 10 value Phenotypic Phylogenetic 50 50 194 0.00e+00 0.0000000 **** p < 1e-04 0.7279874 large
Lexicase 100 value Phenotypic Phylogenetic 50 50 183 0.00e+00 0.0000000 **** p < 1e-04 0.7355706 large
Lexicase 1000 value Phenotypic Phylogenetic 50 50 36 0.00e+00 0.0000000 **** p < 1e-04 0.8369097 large
Lexicase 10000 value Phenotypic Phylogenetic 50 50 1 0.00e+00 0.0000000 **** p < 1e-04 0.8610381 large
Lexicase 100000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 10 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 100 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 1000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 10000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 100000 value Phenotypic Phylogenetic 50 50 249 0.00e+00 0.0000000 **** p < 1e-04 0.6900714 large
Tournament 10 value Phenotypic Phylogenetic 50 50 1798 1.60e-04 0.0040000 ** p = 0.004 0.3777813 moderate
Tournament 100 value Phenotypic Phylogenetic 50 50 2049 0.00e+00 0.0000009 **** p < 1e-04 0.5508162 large
Tournament 1000 value Phenotypic Phylogenetic 50 50 2232 0.00e+00 0.0000000 **** p < 1e-04 0.6769731 large
Tournament 10000 value Phenotypic Phylogenetic 50 50 2329 0.00e+00 0.0000000 **** p < 1e-04 0.7438432 large
Tournament 100000 value Phenotypic Phylogenetic 50 50 1418 2.48e-01 1.0000000 ns p = 1 0.1158162 small

Mean pairwise distance vs. phenotypic richness

# Calculate transfer entropy for mean pairwise distance
# Time points are 10 generations, so calculating lag 1 gives us lag 10
res <- data %>% group_by(directory, selection_name) %>%
summarise(
  fit_phylo_10 =     condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10
                                     discretize(lag(mean_phenotype_pairwise_distance, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_phylo_100 =    condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100
                                     discretize(lag(mean_phenotype_pairwise_distance, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_phylo_1000 =   condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 1000
                                     discretize(lag(mean_phenotype_pairwise_distance, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_phylo_10000 =  condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10000
                                     discretize(lag(mean_phenotype_pairwise_distance, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_phylo_100000 = condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100000
                                     discretize(lag(mean_phenotype_pairwise_distance, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000))),  
  fit_fit_10000 =    condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 1000)), 
                                     discretize(lag(mean_phenotype_pairwise_distance, 1000))),
  fit_fit_100000 =   condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 10000)), 
                                     discretize(lag(mean_phenotype_pairwise_distance, 10000))),  
  fit_pheno_10 =     condinformation(discretize(elite_trait_avg), # TE pheno -> fit, lag 10
                                     discretize(lag(phen_num_taxa, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_pheno_100 =    condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100 
                                     discretize(lag(phen_num_taxa, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_pheno_1000 =   condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 1000
                                     discretize(lag(phen_num_taxa, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_pheno_10000 =  condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 10000
                                     discretize(lag(phen_num_taxa, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_pheno_100000 = condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100000
                                     discretize(lag(phen_num_taxa, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("o_10"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo") ~ "Phylogenetic", TRUE ~ "Phenotypic")

# Plot transfer entropy
ggplot(
  res %>% filter(str_detect(name, "fit_ph*")), 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

Mean and max pairwise distance appear to behave virtually identically.

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic Phylogenetic 50 50 292 0.00e+00 0.0000000 **** p < 1e-04 0.6604279 large
EcoEA 100 value Phenotypic Phylogenetic 50 50 251 0.00e+00 0.0000000 **** p < 1e-04 0.6886926 large
EcoEA 1000 value Phenotypic Phylogenetic 50 50 318 0.00e+00 0.0000000 **** p < 1e-04 0.6425040 large
EcoEA 10000 value Phenotypic Phylogenetic 50 50 211 0.00e+00 0.0000000 **** p < 1e-04 0.7162679 large
EcoEA 100000 value Phenotypic Phylogenetic 50 50 374 0.00e+00 0.0000000 **** p < 1e-04 0.6038986 large
Fitness Sharing 10 value Phenotypic Phylogenetic 50 50 522 5.00e-07 0.0000132 **** p < 1e-04 0.5018701 large
Fitness Sharing 100 value Phenotypic Phylogenetic 50 50 550 1.40e-06 0.0000355 **** p < 1e-04 0.4825674 moderate
Fitness Sharing 1000 value Phenotypic Phylogenetic 50 50 620 1.43e-05 0.0003575 *** p = 0.0003575 0.4343107 moderate
Fitness Sharing 10000 value Phenotypic Phylogenetic 50 50 464 1.00e-07 0.0000015 **** p < 1e-04 0.5418542 large
Fitness Sharing 100000 value Phenotypic Phylogenetic 50 50 345 0.00e+00 0.0000000 **** p < 1e-04 0.6238907 large
Lexicase 10 value Phenotypic Phylogenetic 50 50 193 0.00e+00 0.0000000 **** p < 1e-04 0.7286768 large
Lexicase 100 value Phenotypic Phylogenetic 50 50 222 0.00e+00 0.0000000 **** p < 1e-04 0.7086847 large
Lexicase 1000 value Phenotypic Phylogenetic 50 50 336 0.00e+00 0.0000000 **** p < 1e-04 0.6300951 large
Lexicase 10000 value Phenotypic Phylogenetic 50 50 137 0.00e+00 0.0000000 **** p < 1e-04 0.7672822 large
Lexicase 100000 value Phenotypic Phylogenetic 50 50 140 0.00e+00 0.0000000 **** p < 1e-04 0.7652140 large
Random 10 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 100 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 1000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 10000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 100000 value Phenotypic Phylogenetic 50 50 504 3.00e-07 0.0000069 **** p < 1e-04 0.5142790 large
Tournament 10 value Phenotypic Phylogenetic 50 50 932 2.86e-02 0.7150000 ns p = 0.715 0.2192235 small
Tournament 100 value Phenotypic Phylogenetic 50 50 1106 3.23e-01 1.0000000 ns p = 1 0.0992710 small
Tournament 1000 value Phenotypic Phylogenetic 50 50 1209 7.80e-01 1.0000000 ns p = 1 0.0282647 small
Tournament 10000 value Phenotypic Phylogenetic 50 50 1184 6.52e-01 1.0000000 ns p = 1 0.0454992 small
Tournament 100000 value Phenotypic Phylogenetic 50 50 612 1.11e-05 0.0002775 *** p = 0.0002775 0.4398257 moderate

Mean pairwise distance vs. shannon entropy

# Calculate transfer entropy for mean pairwise distance
# Time points are 10 generations, so calculating lag 1 gives us lag 10
res <- data %>% group_by(directory, selection_name) %>%
summarise(
  fit_phylo_10 =     condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10
                                     discretize(lag(mean_phenotype_pairwise_distance, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_phylo_100 =    condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100
                                     discretize(lag(mean_phenotype_pairwise_distance, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_phylo_1000 =   condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 1000
                                     discretize(lag(mean_phenotype_pairwise_distance, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_phylo_10000 =  condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10000
                                     discretize(lag(mean_phenotype_pairwise_distance, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_phylo_100000 = condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100000
                                     discretize(lag(mean_phenotype_pairwise_distance, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000))),  
  fit_fit_10000 =    condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 1000)), 
                                     discretize(lag(mean_phenotype_pairwise_distance, 1000))),
  fit_fit_100000 =   condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 10000)), 
                                     discretize(lag(mean_phenotype_pairwise_distance, 10000))),  
  fit_pheno_10 =     condinformation(discretize(elite_trait_avg), # TE pheno -> fit, lag 10
                                     discretize(lag(phen_diversity, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_pheno_100 =    condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100 
                                     discretize(lag(phen_diversity, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_pheno_1000 =   condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 1000
                                     discretize(lag(phen_diversity, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_pheno_10000 =  condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 10000
                                     discretize(lag(phen_diversity, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_pheno_100000 = condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100000
                                     discretize(lag(phen_diversity, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("o_10"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo") ~ "Phylogenetic", TRUE ~ "Phenotypic")

# Plot transfer entropy
ggplot(
  res %>% filter(str_detect(name, "fit_ph*")), 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

Looks like Shannon Diversity is more predictive of fitness than phenotypic richness (at least for lexicase), although still not as much as mean pairwise distance.

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic Phylogenetic 50 50 789 1.50e-03 0.0375000
p = 0.0375 0.3178051 moderate
EcoEA 100 value Phenotypic Phylogenetic 50 50 671 6.66e-05 0.0016650 ** p = 0.001665 0.3991522 moderate
EcoEA 1000 value Phenotypic Phylogenetic 50 50 664 5.43e-05 0.0013575 ** p = 0.0013575 0.4039778 moderate
EcoEA 10000 value Phenotypic Phylogenetic 50 50 293 0.00e+00 0.0000000 **** p < 1e-04 0.6597386 large
EcoEA 100000 value Phenotypic Phylogenetic 50 50 429 0.00e+00 0.0000004 **** p < 1e-04 0.5659826 large
Fitness Sharing 10 value Phenotypic Phylogenetic 50 50 705 1.74e-04 0.0043500 ** p = 0.00435 0.3757132 moderate
Fitness Sharing 100 value Phenotypic Phylogenetic 50 50 725 2.99e-04 0.0074750 ** p = 0.007475 0.3619255 moderate
Fitness Sharing 1000 value Phenotypic Phylogenetic 50 50 782 1.27e-03 0.0317500
p = 0.03175 0.3226308 moderate
Fitness Sharing 10000 value Phenotypic Phylogenetic 50 50 612 1.11e-05 0.0002775 *** p = 0.0002775 0.4398257 moderate
Fitness Sharing 100000 value Phenotypic Phylogenetic 50 50 566 2.40e-06 0.0000612 **** p < 1e-04 0.4715373 moderate
Lexicase 10 value Phenotypic Phylogenetic 50 50 560 2.00e-06 0.0000500 **** p < 1e-04 0.4756736 moderate
Lexicase 100 value Phenotypic Phylogenetic 50 50 608 9.80e-06 0.0002440 *** p = 0.000244 0.4425832 moderate
Lexicase 1000 value Phenotypic Phylogenetic 50 50 827 3.58e-03 0.0895000 ns p = 0.0895 0.2916086 small
Lexicase 10000 value Phenotypic Phylogenetic 50 50 608 9.80e-06 0.0002440 *** p = 0.000244 0.4425832 moderate
Lexicase 100000 value Phenotypic Phylogenetic 50 50 636 2.34e-05 0.0005850 *** p = 0.000585 0.4232805 moderate
Random 10 value Phenotypic Phylogenetic 50 50 402 0.00e+00 0.0000001 **** p < 1e-04 0.5845959 large
Random 100 value Phenotypic Phylogenetic 50 50 1 0.00e+00 0.0000000 **** p < 1e-04 0.8610381 large
Random 1000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 10000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 100000 value Phenotypic Phylogenetic 50 50 428 0.00e+00 0.0000004 **** p < 1e-04 0.5666720 large
Tournament 10 value Phenotypic Phylogenetic 50 50 1244 9.70e-01 1.0000000 ns p = 1 0.0041363 small
Tournament 100 value Phenotypic Phylogenetic 50 50 1220 8.39e-01 1.0000000 ns p = 1 0.0206815 small
Tournament 1000 value Phenotypic Phylogenetic 50 50 1280 8.39e-01 1.0000000 ns p = 1 0.0206815 small
Tournament 10000 value Phenotypic Phylogenetic 50 50 1250 1.00e+00 1.0000000 ns p = 1 0.0000000 small
Tournament 100000 value Phenotypic Phylogenetic 50 50 1009 9.73e-02 1.0000000 ns p = 1 0.1661411 small

Mean evolutionary distinctiveness vs. phenotypic richness

# Calculate transfer entropy for mean evolutionary distinctiveness
# Time points are 10 generations, so calculating lag 1 gives us lag 10
res <- data %>% group_by(directory, selection_name) %>%
summarise(
  fit_phylo_10 =     condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_phylo_100 =    condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_phylo_1000 =   condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 1000
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_phylo_10000 =  condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10000
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_phylo_100000 = condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100000
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000))),  
  fit_fit_10000 =    condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 1000)), 
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1000))),
  fit_fit_100000 =   condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 10000)), 
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10000))),  
  fit_pheno_10 =     condinformation(discretize(elite_trait_avg), # TE pheno -> fit, lag 10
                                     discretize(lag(phen_num_taxa, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_pheno_100 =    condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100 
                                     discretize(lag(phen_num_taxa, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_pheno_1000 =   condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 1000
                                     discretize(lag(phen_num_taxa, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_pheno_10000 =  condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 10000
                                     discretize(lag(phen_num_taxa, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_pheno_100000 = condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100000
                                     discretize(lag(phen_num_taxa, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("o_10"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo") ~ "Phylogenetic", TRUE ~ "Phenotypic")

# Plot transfer entropy
ggplot(
  res %>% filter(str_detect(name, "fit_ph*")), 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

Interestingly, mean evolutionary distinctiveness looks fairly comparable to mean pairwise distance (although note that the y axis doesn’t go quite as far).

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic Phylogenetic 50 50 543 1.10e-06 0.0000278 **** p < 1e-04 0.4873931 moderate
EcoEA 100 value Phenotypic Phylogenetic 50 50 540 1.00e-06 0.0000250 **** p < 1e-04 0.4894612 moderate
EcoEA 1000 value Phenotypic Phylogenetic 50 50 629 1.89e-05 0.0004725 *** p = 0.0004725 0.4281062 moderate
EcoEA 10000 value Phenotypic Phylogenetic 50 50 381 0.00e+00 0.0000001 **** p < 1e-04 0.5990729 large
EcoEA 100000 value Phenotypic Phylogenetic 50 50 462 1.00e-07 0.0000014 **** p < 1e-04 0.5432330 large
Fitness Sharing 10 value Phenotypic Phylogenetic 50 50 529 7.00e-07 0.0000170 **** p < 1e-04 0.4970444 moderate
Fitness Sharing 100 value Phenotypic Phylogenetic 50 50 553 1.60e-06 0.0000392 **** p < 1e-04 0.4804992 moderate
Fitness Sharing 1000 value Phenotypic Phylogenetic 50 50 428 0.00e+00 0.0000004 **** p < 1e-04 0.5666720 large
Fitness Sharing 10000 value Phenotypic Phylogenetic 50 50 330 0.00e+00 0.0000000 **** p < 1e-04 0.6342314 large
Fitness Sharing 100000 value Phenotypic Phylogenetic 50 50 146 0.00e+00 0.0000000 **** p < 1e-04 0.7610777 large
Lexicase 10 value Phenotypic Phylogenetic 50 50 237 0.00e+00 0.0000000 **** p < 1e-04 0.6983440 large
Lexicase 100 value Phenotypic Phylogenetic 50 50 272 0.00e+00 0.0000000 **** p < 1e-04 0.6742156 large
Lexicase 1000 value Phenotypic Phylogenetic 50 50 509 3.00e-07 0.0000083 **** p < 1e-04 0.5108321 large
Lexicase 10000 value Phenotypic Phylogenetic 50 50 208 0.00e+00 0.0000000 **** p < 1e-04 0.7183360 large
Lexicase 100000 value Phenotypic Phylogenetic 50 50 230 0.00e+00 0.0000000 **** p < 1e-04 0.7031696 large
Random 10 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 100 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 1000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 10000 value Phenotypic Phylogenetic 50 50 2 0.00e+00 0.0000000 **** p < 1e-04 0.8603487 large
Random 100000 value Phenotypic Phylogenetic 50 50 965 4.98e-02 1.0000000 ns p = 1 0.1964739 small
Tournament 10 value Phenotypic Phylogenetic 50 50 655 4.16e-05 0.0010400 ** p = 0.00104 0.4101823 moderate
Tournament 100 value Phenotypic Phylogenetic 50 50 716 2.35e-04 0.0058750 ** p = 0.005875 0.3681300 moderate
Tournament 1000 value Phenotypic Phylogenetic 50 50 736 4.00e-04 0.0100000 ** p = 0.01 0.3543423 moderate
Tournament 10000 value Phenotypic Phylogenetic 50 50 187 0.00e+00 0.0000000 **** p < 1e-04 0.7328131 large
Tournament 100000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large

Mean evolutionary distinctiveness vs. shannon entropy

# Calculate transfer entropy for mean evolutionary distinctiveness
# Time points are 10 generations, so calculating lag 1 gives us lag 10
res <- data %>% group_by(directory, selection_name) %>%
summarise(
  fit_phylo_10 =     condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_phylo_100 =    condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_phylo_1000 =   condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 1000
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_phylo_10000 =  condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 10000
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_phylo_100000 = condinformation(discretize(elite_trait_avg), # TE phylo -> fit, lag 100000
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000))),  
  fit_fit_10000 =    condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 1000)), 
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1000))),
  fit_fit_100000 =   condinformation(discretize(elite_trait_avg), # Mutual info btwn. fit and lagged fit
                                     discretize(lag(elite_trait_avg, 10000)), 
                                     discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10000))),  
  fit_pheno_10 =     condinformation(discretize(elite_trait_avg), # TE pheno -> fit, lag 10
                                     discretize(lag(phen_diversity, 1)), 
                                     discretize(lag(elite_trait_avg, 1))),
  fit_pheno_100 =    condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100 
                                     discretize(lag(phen_diversity, 10)), 
                                     discretize(lag(elite_trait_avg, 10))),
  fit_pheno_1000 =   condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 1000
                                     discretize(lag(phen_diversity, 100)), 
                                     discretize(lag(elite_trait_avg, 100))),
  fit_pheno_10000 =  condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 10000
                                     discretize(lag(phen_diversity, 1000)), 
                                     discretize(lag(elite_trait_avg, 1000))),
  fit_pheno_100000 = condinformation(discretize(elite_trait_avg),  # TE pheno -> fit, lag 100000
                                     discretize(lag(phen_diversity, 10000)), 
                                     discretize(lag(elite_trait_avg, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("o_10"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo") ~ "Phylogenetic", TRUE ~ "Phenotypic")

# Plot transfer entropy
ggplot(
  res %>% filter(str_detect(name, "fit_ph*")), 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic Phylogenetic 50 50 1386 3.50e-01 1.0000000 ns p = 1 0.0937560 small
EcoEA 100 value Phenotypic Phylogenetic 50 50 1371 4.06e-01 1.0000000 ns p = 1 0.0834152 small
EcoEA 1000 value Phenotypic Phylogenetic 50 50 1352 4.84e-01 1.0000000 ns p = 1 0.0703170 small
EcoEA 10000 value Phenotypic Phylogenetic 50 50 517 4.00e-07 0.0000111 **** p < 1e-04 0.5053170 large
EcoEA 100000 value Phenotypic Phylogenetic 50 50 553 1.60e-06 0.0000392 **** p < 1e-04 0.4804992 moderate
Fitness Sharing 10 value Phenotypic Phylogenetic 50 50 813 2.62e-03 0.0655000 ns p = 0.0655 0.3012599 moderate
Fitness Sharing 100 value Phenotypic Phylogenetic 50 50 839 4.66e-03 0.1165000 ns p = 0.1165 0.2833360 small
Fitness Sharing 1000 value Phenotypic Phylogenetic 50 50 624 1.62e-05 0.0004050 *** p = 0.000405 0.4315531 moderate
Fitness Sharing 10000 value Phenotypic Phylogenetic 50 50 448 0.00e+00 0.0000008 **** p < 1e-04 0.5528844 large
Fitness Sharing 100000 value Phenotypic Phylogenetic 50 50 390 0.00e+00 0.0000001 **** p < 1e-04 0.5928685 large
Lexicase 10 value Phenotypic Phylogenetic 50 50 642 2.81e-05 0.0007025 *** p = 0.0007025 0.4191442 moderate
Lexicase 100 value Phenotypic Phylogenetic 50 50 693 1.25e-04 0.0031250 ** p = 0.003125 0.3839858 moderate
Lexicase 1000 value Phenotypic Phylogenetic 50 50 1040 1.49e-01 1.0000000 ns p = 1 0.1447702 small
Lexicase 10000 value Phenotypic Phylogenetic 50 50 764 8.17e-04 0.0204250
p = 0.020425 0.3350396 moderate
Lexicase 100000 value Phenotypic Phylogenetic 50 50 715 2.29e-04 0.0057250 ** p = 0.005725 0.3688194 moderate
Random 10 value Phenotypic Phylogenetic 50 50 721 2.69e-04 0.0067250 ** p = 0.006725 0.3646831 moderate
Random 100 value Phenotypic Phylogenetic 50 50 278 0.00e+00 0.0000000 **** p < 1e-04 0.6700793 large
Random 1000 value Phenotypic Phylogenetic 50 50 89 0.00e+00 0.0000000 **** p < 1e-04 0.8003725 large
Random 10000 value Phenotypic Phylogenetic 50 50 152 0.00e+00 0.0000000 **** p < 1e-04 0.7569414 large
Random 100000 value Phenotypic Phylogenetic 50 50 881 1.11e-02 0.2775000 ns p = 0.2775 0.2543820 small
Tournament 10 value Phenotypic Phylogenetic 50 50 1014 1.04e-01 1.0000000 ns p = 1 0.1626941 small
Tournament 100 value Phenotypic Phylogenetic 50 50 861 7.40e-03 0.1850000 ns p = 0.185 0.2681696 small
Tournament 1000 value Phenotypic Phylogenetic 50 50 774 1.05e-03 0.0262500
p = 0.02625 0.3281458 moderate
Tournament 10000 value Phenotypic Phylogenetic 50 50 211 0.00e+00 0.0000000 **** p < 1e-04 0.7162679 large
Tournament 100000 value Phenotypic Phylogenetic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large

Transfer entropy between types of diversity

While we’re calculating transfer entropy, we might as well also calculate it between phenotypic diversity and phylogenetic diversity, as these could potentially also be in a feedback loop.

Max pairwise distance and phenotypic richness

res <- data %>% group_by(directory, selection_name) %>%
summarise(
  phen_phylo_10 =      condinformation(discretize(phen_num_taxa), 
                                       discretize(lag(max_phenotype_pairwise_distance, 1)), 
                                       discretize(lag(phen_num_taxa, 1))),
  phen_phylo_100 =     condinformation(discretize(phen_num_taxa),
                                       discretize(lag(max_phenotype_pairwise_distance, 10)),
                                       discretize(lag(phen_num_taxa, 10))),
  pheno_phylo_1000 =   condinformation(discretize(phen_num_taxa),
                                       discretize(lag(max_phenotype_pairwise_distance, 100)),
                                       discretize(lag(phen_num_taxa, 100))),
  pheno_phylo_10000 =  condinformation(discretize(phen_num_taxa),
                                       discretize(lag(max_phenotype_pairwise_distance, 1000)),
                                       discretize(lag(phen_num_taxa, 1000))),
  pheno_phylo_100000 = condinformation(discretize(phen_num_taxa),
                                       discretize(lag(max_phenotype_pairwise_distance, 10000)),
                                       discretize(lag(phen_num_taxa, 10000))),
  
  phylo_pheno_10 =     condinformation(discretize(max_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 1)),
                                       discretize(lag(max_phenotype_pairwise_distance, 1))),
  phylo_pheno_100 =    condinformation(discretize(max_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 10)),
                                       discretize(lag(max_phenotype_pairwise_distance, 10))),
  phylo_pheno_1000 =   condinformation(discretize(max_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 100)),
                                       discretize(lag(max_phenotype_pairwise_distance, 100))),
  phylo_pheno_10000 =  condinformation(discretize(max_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 1000)),
                                       discretize(lag(max_phenotype_pairwise_distance, 1000))),
  phylo_pheno_100000 = condinformation(discretize(max_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 10000)),
                                       discretize(lag(max_phenotype_pairwise_distance, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("phylo"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo_pheno") ~ "\nPhenotypic\n\t->\nPhylogenetic\n", TRUE ~ "\nPhylogenetic\n\t->\nPhenotypic\n")

ggplot(
  res, 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 23 0.00e+00 0.000000 **** p < 1e-04 0.8458717 large
EcoEA 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 36 0.00e+00 0.000000 **** p < 1e-04 0.8369097 large
EcoEA 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 29 0.00e+00 0.000000 **** p < 1e-04 0.8417354 large
EcoEA 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 36 0.00e+00 0.000000 **** p < 1e-04 0.8369097 large
EcoEA 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 59 0.00e+00 0.000000 **** p < 1e-04 0.8210539 large
Fitness Sharing 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 745 5.05e-04 0.012625
p = 0.012625 0.3481379 moderate
Fitness Sharing 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 747 5.32e-04 0.013300
p = 0.0133 0.3467591 moderate
Fitness Sharing 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 645 3.08e-05 0.000770 *** p = 0.00077 0.4170761 moderate
Fitness Sharing 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 700 1.52e-04 0.003800 ** p = 0.0038 0.3791601 moderate
Fitness Sharing 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 722 2.76e-04 0.006900 ** p = 0.0069 0.3639937 moderate
Lexicase 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.000000 **** p < 1e-04 0.8617275 large
Lexicase 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.000000 **** p < 1e-04 0.8617275 large
Lexicase 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.000000 **** p < 1e-04 0.8617275 large
Lexicase 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.000000 **** p < 1e-04 0.8617275 large
Lexicase 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.000000 **** p < 1e-04 0.8617275 large
Random 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.000000 **** p < 1e-04 0.8617275 large
Random 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 17 0.00e+00 0.000000 **** p < 1e-04 0.8500080 large
Random 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 2470 0.00e+00 0.000000 **** p < 1e-04 0.8410460 large
Random 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 2487 0.00e+00 0.000000 **** p < 1e-04 0.8527655 large
Random 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1715 1.36e-03 0.034000
p = 0.034 0.3205626 moderate
Tournament 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 173 0.00e+00 0.000000 **** p < 1e-04 0.7424644 large
Tournament 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 53 0.00e+00 0.000000 **** p < 1e-04 0.8251902 large
Tournament 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 70 0.00e+00 0.000000 **** p < 1e-04 0.8134707 large
Tournament 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 53 0.00e+00 0.000000 **** p < 1e-04 0.8251902 large
Tournament 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 107 0.00e+00 0.000000 **** p < 1e-04 0.7879636 large

Mean pairwise distance and phenotypic richness

res <- data %>% group_by(directory, selection_name) %>%
summarise(
  phen_phylo_10 =      condinformation(discretize(phen_num_taxa), 
                                       discretize(lag(mean_phenotype_pairwise_distance, 1)), 
                                       discretize(lag(phen_num_taxa, 1))),
  phen_phylo_100 =     condinformation(discretize(phen_num_taxa),
                                       discretize(lag(mean_phenotype_pairwise_distance, 10)),
                                       discretize(lag(phen_num_taxa, 10))),
  pheno_phylo_1000 =   condinformation(discretize(phen_num_taxa),
                                       discretize(lag(mean_phenotype_pairwise_distance, 100)),
                                       discretize(lag(phen_num_taxa, 100))),
  pheno_phylo_10000 =  condinformation(discretize(phen_num_taxa),
                                       discretize(lag(mean_phenotype_pairwise_distance, 1000)),
                                       discretize(lag(phen_num_taxa, 1000))),
  pheno_phylo_100000 = condinformation(discretize(phen_num_taxa),
                                       discretize(lag(mean_phenotype_pairwise_distance, 10000)),
                                       discretize(lag(phen_num_taxa, 10000))),
  
  phylo_pheno_10 =     condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 1)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 1))),
  phylo_pheno_100 =    condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 10)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 10))),
  phylo_pheno_1000 =   condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 100)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 100))),
  phylo_pheno_10000 =  condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 1000)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 1000))),
  phylo_pheno_100000 = condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_num_taxa, 10000)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("phylo"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo_pheno") ~ "\nPhenotypic\n\t->\nPhylogenetic\n", TRUE ~ "\nPhylogenetic\n\t->\nPhenotypic\n")

ggplot(
  res, 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

Here, on the other hand, there’s a pretty notable difference between mean and max pairwise distance. Looks like max pairwise distance is in general more predictive of phenotypic richness than mean pairwise distance is (for fitness sharing, mean richness even has a high transfer entropy to MPD than the other way around).

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 344 0.00e+00 0.0000000 **** p < 1e-04 0.6245801 large
EcoEA 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 486 1.00e-07 0.0000035 **** p < 1e-04 0.5266878 large
EcoEA 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 667 5.93e-05 0.0014825 ** p = 0.0014825 0.4019097 moderate
EcoEA 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 701 1.56e-04 0.0039000 ** p = 0.0039 0.3784707 moderate
EcoEA 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 731 3.51e-04 0.0087750 ** p = 0.008775 0.3577893 moderate
Fitness Sharing 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1604 1.48e-02 0.3700000 ns p = 0.37 0.2440412 small
Fitness Sharing 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1653 5.52e-03 0.1380000 ns p = 0.138 0.2778209 small
Fitness Sharing 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1384 3.57e-01 1.0000000 ns p = 1 0.0923772 small
Fitness Sharing 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1744 6.69e-04 0.0167250
p = 0.016725 0.3405547 moderate
Fitness Sharing 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1567 2.91e-02 0.7275000 ns p = 0.7275 0.2185341 small
Lexicase 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 229 0.00e+00 0.0000000 **** p < 1e-04 0.7038590 large
Random 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 2 0.00e+00 0.0000000 **** p < 1e-04 0.8603487 large
Random 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1806 1.28e-04 0.0032000 ** p = 0.0032 0.3832964 moderate
Random 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 2493 0.00e+00 0.0000000 **** p < 1e-04 0.8569018 large
Random 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 2499 0.00e+00 0.0000000 **** p < 1e-04 0.8610381 large
Random 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1683 2.87e-03 0.0717500 ns p = 0.07175 0.2985024 small
Tournament 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1297 7.49e-01 1.0000000 ns p = 1 0.0324010 small
Tournament 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1309 6.87e-01 1.0000000 ns p = 1 0.0406735 small
Tournament 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1298 7.43e-01 1.0000000 ns p = 1 0.0330903 small
Tournament 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1295 7.59e-01 1.0000000 ns p = 1 0.0310222 small
Tournament 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1277 8.55e-01 1.0000000 ns p = 1 0.0186133 small

Mean pairwise distance and shannon diversity

res <- data %>% group_by(directory, selection_name) %>%
summarise(
  phen_phylo_10 =      condinformation(discretize(phen_diversity), 
                                       discretize(lag(mean_phenotype_pairwise_distance, 1)), 
                                       discretize(lag(phen_diversity, 1))),
  phen_phylo_100 =     condinformation(discretize(phen_diversity),
                                       discretize(lag(mean_phenotype_pairwise_distance, 10)),
                                       discretize(lag(phen_diversity, 10))),
  pheno_phylo_1000 =   condinformation(discretize(phen_diversity),
                                       discretize(lag(mean_phenotype_pairwise_distance, 100)),
                                       discretize(lag(phen_diversity, 100))),
  pheno_phylo_10000 =  condinformation(discretize(phen_diversity),
                                       discretize(lag(mean_phenotype_pairwise_distance, 1000)),
                                       discretize(lag(phen_diversity, 1000))),
  pheno_phylo_100000 = condinformation(discretize(phen_diversity),
                                       discretize(lag(mean_phenotype_pairwise_distance, 10000)),
                                       discretize(lag(phen_diversity, 10000))),
  
  phylo_pheno_10 =     condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_diversity, 1)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 1))),
  phylo_pheno_100 =    condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_diversity, 10)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 10))),
  phylo_pheno_1000 =   condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_diversity, 100)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 100))),
  phylo_pheno_10000 =  condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_diversity, 1000)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 1000))),
  phylo_pheno_100000 = condinformation(discretize(mean_phenotype_pairwise_distance),
                                       discretize(lag(phen_diversity, 10000)),
                                       discretize(lag(mean_phenotype_pairwise_distance, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("phylo"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo_pheno") ~ "\nPhenotypic\n\t->\nPhylogenetic\n", TRUE ~ "\nPhylogenetic\n\t->\nPhenotypic\n")

ggplot(
  res, 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 99 0.00000 0.0000000 **** p < 1e-04 0.7934787 large
EcoEA 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 175 0.00000 0.0000000 **** p < 1e-04 0.7410856 large
EcoEA 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 89 0.00000 0.0000000 **** p < 1e-04 0.8003725 large
EcoEA 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 96 0.00000 0.0000000 **** p < 1e-04 0.7955468 large
EcoEA 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 435 0.00000 0.0000005 **** p < 1e-04 0.5618463 large
Fitness Sharing 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 878 0.01040 0.2600000 ns p = 0.26 0.2564501 small
Fitness Sharing 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 909 0.01890 0.4725000 ns p = 0.4725 0.2350793 small
Fitness Sharing 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 850 0.00589 0.1472500 ns p = 0.14725 0.2757528 small
Fitness Sharing 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 998 0.08300 1.0000000 ns p = 1 0.1737243 small
Fitness Sharing 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1250 1.00000 1.0000000 ns p = 1 0.0000000 small
Lexicase 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 6 0.00000 0.0000000 **** p < 1e-04 0.8575912 large
Lexicase 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 9 0.00000 0.0000000 **** p < 1e-04 0.8555230 large
Lexicase 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 36 0.00000 0.0000000 **** p < 1e-04 0.8369097 large
Lexicase 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 29 0.00000 0.0000000 **** p < 1e-04 0.8417354 large
Lexicase 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 172 0.00000 0.0000000 **** p < 1e-04 0.7431538 large
Random 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00000 0.0000000 **** p < 1e-04 0.8617275 large
Random 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00000 0.0000000 **** p < 1e-04 0.8617275 large
Random 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 913 0.02040 0.5100000 ns p = 0.51 0.2323217 small
Random 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1541 0.04520 1.0000000 ns p = 1 0.2006102 small
Random 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1298 0.74300 1.0000000 ns p = 1 0.0330903 small
Tournament 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1235 0.92000 1.0000000 ns p = 1 0.0103407 small
Tournament 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1264 0.92600 1.0000000 ns p = 1 0.0096513 small
Tournament 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1260 0.94800 1.0000000 ns p = 1 0.0068938 small
Tournament 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1227 0.87700 1.0000000 ns p = 1 0.0158558 small
Tournament 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1235 0.92000 1.0000000 ns p = 1 0.0103407 small

Mean evolutionary distinctiveness and phenotypic richness

res <- data %>% group_by(directory, selection_name) %>%
summarise(
  phen_phylo_10 =      condinformation(
                          discretize(phen_num_taxa), 
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1)), 
                          discretize(lag(phen_num_taxa, 1))),
  phen_phylo_100 =     condinformation(
                          discretize(phen_num_taxa),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10)),
                          discretize(lag(phen_num_taxa, 10))),
  pheno_phylo_1000 =   condinformation(
                          discretize(phen_num_taxa),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 100)),
                          discretize(lag(phen_num_taxa, 100))),
  pheno_phylo_10000 =  condinformation(
                          discretize(phen_num_taxa),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1000)),
                          discretize(lag(phen_num_taxa, 1000))),
  pheno_phylo_100000 = condinformation(
                          discretize(phen_num_taxa),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10000)),
                          discretize(lag(phen_num_taxa, 10000))),
  
  phylo_pheno_10 =     condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_num_taxa, 1)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1))),
  phylo_pheno_100 =    condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_num_taxa, 10)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10))),
  phylo_pheno_1000 =   condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_num_taxa, 100)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 100))),
  phylo_pheno_10000 =  condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_num_taxa, 1000)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1000))),
  phylo_pheno_100000 = condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_num_taxa, 10000)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("phylo"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo_pheno") ~ "\nPhenotypic\n\t->\nPhylogenetic\n", TRUE ~ "\nPhylogenetic\n\t->\nPhenotypic\n")

ggplot(
  res, 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 25 0.000000 0.0000000 **** p < 1e-04 0.8444929 large
EcoEA 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 97 0.000000 0.0000000 **** p < 1e-04 0.7948574 large
EcoEA 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 172 0.000000 0.0000000 **** p < 1e-04 0.7431538 large
EcoEA 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 199 0.000000 0.0000000 **** p < 1e-04 0.7245405 large
EcoEA 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 448 0.000000 0.0000008 **** p < 1e-04 0.5528844 large
Fitness Sharing 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 817 0.002870 0.0717500 ns p = 0.07175 0.2985024 small
Fitness Sharing 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 947 0.037000 0.9250000 ns p = 0.925 0.2088827 small
Fitness Sharing 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1321 0.627000 1.0000000 ns p = 1 0.0489461 small
Fitness Sharing 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1814 0.000102 0.0025500 ** p = 0.00255 0.3888114 moderate
Fitness Sharing 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1606 0.014300 0.3575000 ns p = 0.3575 0.2454200 small
Lexicase 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.000000 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.000000 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.000000 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.000000 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 13 0.000000 0.0000000 **** p < 1e-04 0.8527655 large
Random 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1173 0.598000 1.0000000 ns p = 1 0.0530824 small
Random 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 2473 0.000000 0.0000000 **** p < 1e-04 0.8431142 large
Random 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 2498 0.000000 0.0000000 **** p < 1e-04 0.8603487 large
Random 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 2478 0.000000 0.0000000 **** p < 1e-04 0.8465611 large
Random 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1620 0.010900 0.2725000 ns p = 0.2725 0.2550713 small
Tournament 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 358 0.000000 0.0000000 **** p < 1e-04 0.6149287 large
Tournament 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 453 0.000000 0.0000010 **** p < 1e-04 0.5494374 large
Tournament 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1158 0.528000 1.0000000 ns p = 1 0.0634231 small
Tournament 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1857 0.000029 0.0007250 *** p = 0.000725 0.4184549 moderate
Tournament 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1760 0.000444 0.0111000
p = 0.0111 0.3515848 moderate

Mean evolutionary distinctiveness and shannon diversity

res <- data %>% group_by(directory, selection_name) %>%
summarise(
  phen_phylo_10 =      condinformation(
                          discretize(phen_diversity), 
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1)), 
                          discretize(lag(phen_diversity, 1))),
  phen_phylo_100 =     condinformation(
                          discretize(phen_diversity),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10)),
                          discretize(lag(phen_diversity, 10))),
  pheno_phylo_1000 =   condinformation(
                          discretize(phen_diversity),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 100)),
                          discretize(lag(phen_diversity, 100))),
  pheno_phylo_10000 =  condinformation(
                          discretize(phen_diversity),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1000)),
                          discretize(lag(phen_diversity, 1000))),
  pheno_phylo_100000 = condinformation(
                          discretize(phen_diversity),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10000)),
                          discretize(lag(phen_diversity, 10000))),
  
  phylo_pheno_10 =     condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_diversity, 1)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1))),
  phylo_pheno_100 =    condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_diversity, 10)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10))),
  phylo_pheno_1000 =   condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_diversity, 100)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 100))),
  phylo_pheno_10000 =  condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_diversity, 1000)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 1000))),
  phylo_pheno_100000 = condinformation(
                          discretize(mean_phenotype_evolutionary_distinctiveness),
                          discretize(lag(phen_diversity, 10000)),
                          discretize(lag(mean_phenotype_evolutionary_distinctiveness, 10000)))
)

# Turn Transfer Entropy columns into rows
res <- res %>% pivot_longer(cols=contains("phylo"))
# Pull lag into a column
res$offset <- str_extract(res$name, "[:digit:]*$")
# Make column indicating direction of transfer entropy
res$Type <- case_when(str_detect(res$name, "phylo_pheno") ~ "\nPhenotypic\n\t->\nPhylogenetic\n", TRUE ~ "\nPhylogenetic\n\t->\nPhenotypic\n")

ggplot(
  res, 
  aes(
    x=as.factor(offset), 
    y=value, 
    color=Type
    )
  ) + 
  geom_boxplot() + 
  facet_wrap(~selection_name) + 
  scale_x_discrete("Lag",labels=c("10","","1000","","100000")) + 
  scale_y_continuous("Transfer Entropy") + 
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0)) + 
  scale_color_discrete("")

# Determine which conditions are significantly different from each other
transfer_entropy_stats(res)
selection_name offset .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif label effsize magnitude
EcoEA 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
EcoEA 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
EcoEA 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 5 0.00e+00 0.0000000 **** p < 1e-04 0.8582806 large
EcoEA 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 4 0.00e+00 0.0000000 **** p < 1e-04 0.8589700 large
EcoEA 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 191 0.00e+00 0.0000000 **** p < 1e-04 0.7300555 large
Fitness Sharing 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 772 9.95e-04 0.0248750
p = 0.024875 0.3295246 moderate
Fitness Sharing 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 770 9.48e-04 0.0237000
p = 0.0237 0.3309034 moderate
Fitness Sharing 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 796 1.77e-03 0.0442500
p = 0.04425 0.3129794 moderate
Fitness Sharing 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1020 1.14e-01 1.0000000 ns p = 1 0.1585579 small
Fitness Sharing 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1274 8.71e-01 1.0000000 ns p = 1 0.0165452 small
Lexicase 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Lexicase 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 8 0.00e+00 0.0000000 **** p < 1e-04 0.8562124 large
Random 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 0 0.00e+00 0.0000000 **** p < 1e-04 0.8617275 large
Random 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1 0.00e+00 0.0000000 **** p < 1e-04 0.8610381 large
Random 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1278 8.50e-01 1.0000000 ns p = 1 0.0193027 small
Random 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1246 9.81e-01 1.0000000 ns p = 1 0.0027575 small
Random 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1260 9.48e-01 1.0000000 ns p = 1 0.0068938 small
Tournament 10 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 395 0.00e+00 0.0000001 **** p < 1e-04 0.5894216 large
Tournament 100 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 483 1.00e-07 0.0000032 **** p < 1e-04 0.5287560 large
Tournament 1000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1114 3.50e-01 1.0000000 ns p = 1 0.0937560 small
Tournament 10000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1756 4.92e-04 0.0123000
p = 0.0123 0.3488273 moderate
Tournament 100000 value Phenotypic -> Phylogenetic Phylogenetic -> Phenotypic 50 50 1685 2.74e-03 0.0685000 ns p = 0.0685 0.2998812 small